Axes (ggplot2)
Problem
You want to change the order or direction of the axes.
Solution
Note: In the examples below, where it says something like scale_y_continuous, scale_x_continuous, or ylim, the y can be replaced with x if you want to operate on the other axis.
This is the basic boxplot that we will work with, using the built-in PlantGrowth data set.
library(ggplot2)

bp <- ggplot(PlantGrowth, aes(x=group, y=weight)) + geom_boxplot()
bp
Swapping X and Y axes
Swap x and y axes (make x vertical, y horizontal):
bp + coord_flip()
Discrete axis
Changing the order of items
# Manually set the order of a discrete-valued axis
bp + scale_x_discrete(limits=c("trt1","trt2","ctrl"))

# Reverse the order of a discrete-valued axis
# Get the levels of the factor
flevels <- levels(PlantGrowth$group)
# "ctrl" "trt1" "trt2"
# Reverse the order
flevels <- rev(flevels)
# "trt2" "trt1" "ctrl"
bp + scale_x_discrete(limits=flevels)

# Or it can be done in one line:
bp + scale_x_discrete(limits = rev(levels(PlantGrowth$group)))
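An alternative, if the new order should apply to every plot made from the data, is to reorder the factor levels in the data frame itself. This is a minimal sketch; the copy named pg and the plot named bp2 are hypothetical names introduced here for illustration.

```r
library(ggplot2)

# Work on a copy so the built-in data set is left untouched
pg <- PlantGrowth

# Re-create the factor with the levels in the desired order;
# a discrete axis follows the level order by default
pg$group <- factor(pg$group, levels = c("trt1", "trt2", "ctrl"))
levels(pg$group)

bp2 <- ggplot(pg, aes(x=group, y=weight)) + geom_boxplot()
bp2
```

With this approach no scale_x_discrete(limits=...) call is needed on each plot.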
Setting tick mark labels
For discrete variables, the tick mark labels are taken directly from levels of the factor. However, sometimes the factor levels have short names that aren't suitable for presentation.
bp + scale_x_discrete(breaks=c("ctrl", "trt1", "trt2"), labels=c("Control", "Treat 1", "Treat 2"))
# Hide x tick marks, labels, and grid lines
bp + scale_x_discrete(breaks=NA)

# Hide all tick marks and labels (on X axis), but keep the gridlines
bp + opts(axis.ticks = theme_blank(), axis.text.x = theme_blank())
If you want to remove the grid lines but keep the tick marks and labels, it requires a bit of a hack.
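The opts() and theme_blank() calls above belong to the ggplot2 release this recipe was written for. Assuming ggplot2 0.9.2 or later, where theme() and element_blank() took their place and breaks=NULL is expected instead of breaks=NA, a sketch of the same two steps might look like this:

```r
library(ggplot2)

bp <- ggplot(PlantGrowth, aes(x=group, y=weight)) + geom_boxplot()

# Hide x tick marks, labels, and grid lines
# (assumption: newer releases want NULL rather than NA here)
bp + scale_x_discrete(breaks=NULL)

# Hide all tick marks and labels (on X axis), but keep the gridlines
bp + theme(axis.ticks = element_blank(), axis.text.x = element_blank())
```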
Continuous axis
Setting range and reversing direction of an axis
Note that if any scale_y_continuous command is used, it overrides any ylim command, and the ylim will be ignored.
# Set the range of a continuous-valued axis
# These are equivalent
bp + ylim(0,8)
bp + scale_y_continuous(limits=c(0,8))
If the y range is reduced using the method above, the data outside the range is ignored. This might be OK for a scatterplot, but it can be problematic for the box plots used here. For bar graphs, if the range does not include 0, the bars will not show at all!
To avoid this problem, you can use coord_cartesian instead. Instead of setting the limits of the data, it sets the viewing area of the data. This avoids one problem, but there is another problem: the tick markers may not be appropriate for the new range. This issue can be fixed by specifying them with scale_y_continuous.
# These two do the same thing; all data points outside the graphing range are dropped,
# resulting in a misleading box plot
bp + ylim(5,7.5)
bp + scale_y_continuous(limits=c(5,7.5))

# Using coord_cartesian "zooms" into the area but the tick marks might not
# be right because they are set for the "natural" window
bp + coord_cartesian(ylim=c(5,7.5))

# Specify tick marks directly
bp + coord_cartesian(ylim=c(5,7.5)) +
     scale_y_continuous(breaks=seq(0, 10, 0.5))  # Ticks from 0-10, every .5
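To see why the limits-based approach can mislead, it may help to count what gets dropped. The sketch below uses only base R and mimics the filtering by hand; the names w and inside are introduced here for illustration, and it assumes that values lying exactly on a limit are kept.

```r
# Weights from the built-in data set used throughout this recipe
w <- PlantGrowth$weight

# TRUE for observations that fall inside the range 5 to 7.5
inside <- w >= 5 & w <= 7.5

sum(!inside)        # Number of observations the limits would discard
median(w)           # Median of the complete data
median(w[inside])   # Median of what is left after discarding
```

Because the box plot statistics are computed after the out-of-range values are removed, the boxes describe only the remaining data; coord_cartesian leaves the statistics untouched and only changes the visible window.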
Reversing the direction of an axis
# Reverse order of a continuous-valued axis
bp + ylim(max(PlantGrowth$weight), min(PlantGrowth$weight))
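Another option avoids looking up the data range by hand. This sketch assumes that your ggplot2 version provides scale_y_reverse(), the y counterpart of the scale_x_reverse entry listed among the scaling methods in this recipe:

```r
library(ggplot2)

bp <- ggplot(PlantGrowth, aes(x=group, y=weight)) + geom_boxplot()

# Reverse the direction of the y axis; the range is still taken from the data
bp + scale_y_reverse()
```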
Setting and hiding tick markers
# Setting the tick marks on an axis
# This will show tick marks on every 0.25 from 1 to 10
# The scale will show only the ones that are within range (3.50-6.25 in this case)
bp + scale_y_continuous(breaks=seq(1,10,1/4))

# The breaks can be different sizes
bp + scale_y_continuous(breaks=c(4, 4.25, 4.5, 5, 6, 8))

# Suppress tick marks
bp + scale_y_continuous(breaks=NA)

# Hide tick marks and labels (on Y axis), but keep the gridlines
bp + opts(axis.ticks = theme_blank(), axis.text.y = theme_blank())
Scaling axes
By default, the axes are linearly scaled. It is possible to scale the axes with log, power, roots, and so on.
# Create some noisy exponentially-distributed data
set.seed(204)
n <- 100
df <- data.frame(xval = 2^((1:n+rnorm(n,sd=5))/20),
                 yval = 2*2^((1:n+rnorm(n,sd=5))/20))

# A scatterplot with regular (linear) axis scaling
sp <- ggplot(df, aes(xval, yval)) + geom_point()
sp

# log2 scaling of the axes, with visually-equal spacing
sp + scale_x_log2() + scale_y_log2()

# log2 transformation of axes, with visually-diminishing spacing
sp + coord_trans(x="log2", y="log2")
Other scaling methods include the following. These can be used with coord_trans by using just the last portion of the name:
scale_x_asn | scale_x_log10 | scale_x_prob |
scale_x_atanh | scale_x_log2 | scale_x_probit |
scale_x_exp | scale_x_logit | scale_x_reverse |
scale_x_inverse | scale_x_pow | scale_x_sqrt |
scale_x_log | scale_x_pow10 |
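Several of the scale functions above, including scale_x_log2(), were removed in later ggplot2 releases. Assuming ggplot2 0.9.0 or later, the same visually-equal log2 spacing can be sketched by passing the transformation name to a continuous scale. Releases from 3.5.0 onward call this argument transform and may print a deprecation message for trans.

```r
library(ggplot2)

# Same noisy exponentially-distributed data as in the example above
set.seed(204)
n <- 100
df <- data.frame(xval = 2^((1:n + rnorm(n, sd=5))/20),
                 yval = 2*2^((1:n + rnorm(n, sd=5))/20))

sp <- ggplot(df, aes(xval, yval)) + geom_point()

# log2 scaling of both axes via the transformation name
# (assumption: the installed ggplot2 still accepts the trans argument)
sp + scale_x_continuous(trans="log2") + scale_y_continuous(trans="log2")
```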
It is also possible to force the scaling of the axes to be equal, with one visual unit representing the same numeric unit on both axes. When the scales are equal, it is also possible to control the relative length of the axes; for example, so that one axis is 3 times longer than the other.
# Force equal scaling
sp + coord_equal()

# Equal scaling, with y axis 3 times longer than x axis
sp + coord_equal(ratio=3)
Axis labels and text formatting
To set and hide the axis labels:
bp + opts(axis.title.x = theme_blank()) +   # Remove x-axis label
     ylab("Weight (Kg)")                    # Set y-axis label
To change the fonts, and rotate tick mark labels:
# Change font options:
# X-axis label: bold, red, and 20 points
# X-axis tick marks: rotate 90 degrees CCW, move to the left a bit (hjust), and 16 points
bp + opts(axis.title.x = theme_text(face="bold", colour="#990000", size=20),
          axis.text.x  = theme_text(angle=90, hjust=1.2, size=16))
Tick mark label text formatters
You may want to display your values as percents, or dollars, or in scientific notation. To do this you can use a formatter:
# Label formatters
bp + scale_y_continuous(formatter="percent") +
     scale_x_discrete(formatter="abbreviate")  # In this particular case, it has no effect
(Note: In the upcoming version of ggplot2, 0.9.0, the parameter is labels instead of formatter.)
Other useful formatters for continuous scales include percent and scientific. For discrete scales, abbreviate will remove vowels and spaces and shorten to four characters.
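Following the note above about the labels parameter, and assuming ggplot2 0.9.0 or later with the scales package installed (where the formatter functions are provided), the percent example might be written as in this sketch:

```r
library(ggplot2)
library(scales)

bp <- ggplot(PlantGrowth, aes(x=group, y=weight)) + geom_boxplot()

# Format the y tick labels as percentages;
# percent() comes from the scales package, not from ggplot2 itself
bp + scale_y_continuous(labels=percent)
```

Here the formatter is passed as a function rather than as a character string.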
Sometimes you may need to create your own formatting function. This one will display numeric minutes in HH:MM:SS format.
# Self-defined formatting function for times.
timeHMS_formatter <- function(x) {
    h <- floor(x/60)
    m <- floor(x %% 60)
    s <- round(60*(x %% 1))                   # Round to nearest second
    lab <- sprintf('%02d:%02d:%02d', h, m, s) # Format the strings as HH:MM:SS
    lab <- gsub('^00:', '', lab)              # Remove leading 00: if present
    lab <- gsub('^0', '', lab)                # Remove leading 0 if present
    lab                                       # Return the formatted labels
}

bp + scale_y_continuous(formatter=timeHMS_formatter)
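A formatting function like this can be checked on its own before it is attached to a scale. The sketch below repeats the function (with an explicit return value added) so that it runs with base R alone, and calls it on three sample values in minutes chosen here for illustration.

```r
# Same formatter as above, repeated so this block is self-contained
timeHMS_formatter <- function(x) {
    h <- floor(x/60)
    m <- floor(x %% 60)
    s <- round(60*(x %% 1))                   # Round to nearest second
    lab <- sprintf('%02d:%02d:%02d', h, m, s) # Format the strings as HH:MM:SS
    lab <- gsub('^00:', '', lab)              # Remove leading 00: if present
    lab <- gsub('^0', '', lab)                # Remove leading 0 if present
    lab
}

# Half a minute, 61 and a quarter minutes, and 605 minutes
timeHMS_formatter(c(0.5, 61.25, 605))
# "0:30" "1:01:15" "10:05:00"
```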
Hiding gridlines
To hide all gridlines, both vertical and horizontal:
# Hide all the gridlines
bp + opts(panel.grid.minor=theme_blank(), panel.grid.major=theme_blank())

# Hide just the minor gridlines
bp + opts(panel.grid.minor=theme_blank())
Hiding only horizontal or vertical gridlines
Hiding just the vertical or horizontal gridlines with ggplot2 requires a bit of a hack. An internal function called guide_grid must be redefined to not draw the vertical or horizontal grid lines.
# Save the original definition of guide_grid
guide_grid_orig <- ggplot2:::guide_grid

# Create the replacement function
guide_grid_no_vline <- function(theme, x.minor, x.major, y.minor, y.major) {
    ggname("grill", grobTree(
        theme_render(theme, "panel.background"),
        theme_render(
            theme, "panel.grid.minor", name = "y",
            x = rep(0:1, length(y.minor)), y = rep(y.minor, each=2),
            id.lengths = rep(2, length(y.minor))
        ),
        theme_render(
            theme, "panel.grid.major", name = "y",
            x = rep(0:1, length(y.major)), y = rep(y.major, each=2),
            id.lengths = rep(2, length(y.major))
        )
    ))
}

# Assign the function inside ggplot2
assignInNamespace("guide_grid", guide_grid_no_vline, pos="package:ggplot2")

# Draw the graph
bp

# Restore the original guide_grid function so that it will draw all gridlines again
assignInNamespace("guide_grid", guide_grid_orig, pos="package:ggplot2")
This will hide vertical grid lines -- even if the x and y axes have been swapped with coord_flip(), it will still hide the vertical lines.
Unfortunately, the replacement guide_grid does not get stored as part of the definition of the plot -- the call to assignInNamespace() must be done just before outputting the plot. This can be annoying if you define the plots in one place and output them in another.
This will hide horizontal grid lines:
# Save the original definition of guide_grid
guide_grid_orig <- ggplot2:::guide_grid

# Create the replacement function
guide_grid_no_hline <- function(theme, x.minor, x.major, y.minor, y.major) {
    ggname("grill", grobTree(
        theme_render(theme, "panel.background"),
        theme_render(
            theme, "panel.grid.minor", name = "x",
            x = rep(x.minor, each=2), y = rep(0:1, length(x.minor)),
            id.lengths = rep(2, length(x.minor))
        ),
        theme_render(
            theme, "panel.grid.major", name = "x",
            x = rep(x.major, each=2), y = rep(0:1, length(x.major)),
            id.lengths = rep(2, length(x.major))
        )
    ))
}

# Assign the function inside ggplot2
assignInNamespace("guide_grid", guide_grid_no_hline, pos="package:ggplot2")

# Draw the graph
bp

# Restore the original guide_grid function so that it will draw all gridlines again
assignInNamespace("guide_grid", guide_grid_orig, pos="package:ggplot2")
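In later ggplot2 releases this hack should not be needed, and the internal guide_grid function may no longer exist to be replaced. Assuming ggplot2 0.9.2 or later, the theme system accepts direction-specific grid elements, so both cases can be sketched as follows:

```r
library(ggplot2)

bp <- ggplot(PlantGrowth, aes(x=group, y=weight)) + geom_boxplot()

# Hide the vertical gridlines (the ones that belong to the x axis)
bp + theme(panel.grid.major.x = element_blank(),
           panel.grid.minor.x = element_blank())

# Hide the horizontal gridlines (the ones that belong to the y axis)
bp + theme(panel.grid.major.y = element_blank(),
           panel.grid.minor.y = element_blank())
```

Unlike the replacement function, these theme settings are stored in the plot object, so the plot can be defined in one place and drawn in another.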